Benign Overfitting in Single-Head Attention

Neural Information Processing Systems

The phenomenon of benign overfitting, where a trained neural network perfectly fits noisy training data but still achieves near-optimal test performance, has been extensively studied in recent years for linear models and fully-connected/convolutional networks. In this work, we study benign overfitting in a single-head softmax attention model, which is the fundamental building block of Transformers. We prove that under appropriate conditions, the model exhibits benign overfitting in a classification setting after only two steps of gradient descent. Moreover, we show conditions under which a minimum-norm/maximum-margin interpolator exhibits benign overfitting. We study how the overfitting behavior depends on the signal-to-noise ratio (SNR) of the data distribution, namely, the ratio between the norms of the signal and noise tokens, and prove that a sufficiently large SNR is both necessary and sufficient for benign overfitting.
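To make the model and the SNR in this abstract concrete, here is a minimal NumPy sketch. It assumes a parametrization f(X) = vᵀXᵀ softmax(Xp), with a trainable value vector `v` and a key-query vector `p`, and a two-token input made of a signal token yμ and a noise token ξ. These names and the two-token layout are illustrative assumptions, not necessarily the paper's exact setup. The SNR is computed as ‖μ‖/‖ξ‖, following the abstract's description of the ratio between the norms of the signal and noise tokens.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def attention_output(X: np.ndarray, v: np.ndarray, p: np.ndarray) -> float:
    """Single-head softmax attention classifier (assumed form).

    X: (T, d) token matrix; v: (d,) value/readout vector; p: (d,) key-query vector.
    Returns the scalar score f(X) = v^T X^T softmax(X p); its sign is the predicted label.
    """
    scores = X @ p             # one attention logit per token, shape (T,)
    weights = softmax(scores)  # attention weights over tokens, summing to 1
    pooled = X.T @ weights     # attention-weighted average of the tokens, shape (d,)
    return float(v @ pooled)


def snr(mu: np.ndarray, xi: np.ndarray) -> float:
    """Signal-to-noise ratio as the ratio of signal-token norm to noise-token norm."""
    return float(np.linalg.norm(mu) / np.linalg.norm(xi))


if __name__ == "__main__":
    mu = np.array([6.0, 0.0, 0.0, 0.0])  # hypothetical signal direction
    xi = np.array([0.0, 0.0, 3.0, 4.0])  # hypothetical noise token, orthogonal to mu
    y = 1.0                              # clean label
    X = np.stack([y * mu, xi])           # two tokens: signal token y*mu, noise token xi
    v = mu / np.linalg.norm(mu)          # readout aligned with the signal
    p = np.zeros(4)                      # p = 0 gives uniform attention
    print(attention_output(X, v, p))  # 3.0
    print(snr(mu, xi))                # 1.2
```

With `p = 0` the attention weights are uniform, so the output simply averages the two tokens; a nonzero `p` reweights the tokens according to their scores `X @ p`, which is the quantity gradient descent on `p` would adjust.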

Learning with Noisy Correspondence for Cross-modal Matching

Neural Information Processing Systems

In practice, however, such an assumption is extremely expensive or even impossible to satisfy. Based on this observation, we reveal and study a latent and challenging direction in cross-modal matching, named noisy correspondence, which could be regarded as a new paradigm of noisy labels.
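One common way to simulate noisy correspondence in experiments is to take a clean paired dataset and shuffle the captions of a randomly chosen fraction of image-text pairs, so that those images become matched with wrong texts. The sketch below assumes that protocol; the function `make_noisy_correspondence` and its parameters are hypothetical and only illustrate the idea, not the authors' exact procedure.

```python
import numpy as np


def make_noisy_correspondence(num_pairs: int, noise_rate: float, seed: int = 0):
    """Simulate noisy correspondence by shuffling captions for a fraction of pairs.

    Returns (caption_idx, is_clean): image i is paired with caption caption_idx[i],
    and is_clean[i] is True when that pairing is still the original, correct one.
    """
    rng = np.random.default_rng(seed)
    caption_idx = np.arange(num_pairs)  # start from perfectly aligned pairs
    num_noisy = int(round(noise_rate * num_pairs))
    # Pick which pairs to corrupt, then permute captions among only those pairs.
    noisy = rng.choice(num_pairs, size=num_noisy, replace=False)
    caption_idx[noisy] = rng.permutation(caption_idx[noisy])
    # A permutation can leave some chosen pairs unchanged, so compute cleanliness directly.
    is_clean = caption_idx == np.arange(num_pairs)
    return caption_idx, is_clean


if __name__ == "__main__":
    idx, clean = make_noisy_correspondence(num_pairs=10, noise_rate=0.4, seed=0)
    print(idx)
    print("mismatched pairs:", int((~clean).sum()))
```

Unlike class-label noise, the "label" being corrupted here is the pairing itself, which is why the returned structure is an index mapping rather than a vector of class labels.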